Abstract


The intrinsic structure of binary fields poses a challenging complexity
problem from both the hardware and the software points of view. Motivated by
applications to modern cryptography, we describe some simple techniques
for performing computations over binary fields on systems with
limited resources. This is particularly important when such computations
must be carried out by very small and simple machines. The
algorithms described in the present paper provide increased efficiency
in computations when compared with the previously known algorithms for
arithmetic over prime fields.
1 Introduction

Since the introduction of public key cryptography, numerous papers have been
published on the problem of constructing efficient algorithms for the arithmetic
of finite fields. In this respect, a vast amount of research has been
carried out for Elliptic Curve Cryptography (ECC), [B].

Recently, cryptosystems have been increasingly used in machines with very
limited resources, such as smart cards, microchips and microcontrollers.
This raises the problem of finding fast and efficient algorithms for field
arithmetic when computations are to be performed by such simple devices.

The NIST^1 has given recommendations for the selection of the underlying
finite fields and elliptic curves. The latest revision of these standards
was made available in the publication FIPS 186-3 [1]. This publication
recommends 5 prime fields $F_p$, with $p$ chosen among the following primes:

$p_{192} = 2^{192} - 2^{64} - 1$,
$p_{224} = 2^{224} - 2^{96} + 1$,
$p_{256} = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$,
$p_{384} = 2^{384} - 2^{128} - 2^{96} + 2^{32} - 1$,
$p_{521} = 2^{521} - 1$,

plus 5 binary fields: $F_{2^{163}}$, $F_{2^{233}}$, $F_{2^{283}}$, $F_{2^{409}}$ and
$F_{2^{571}}$. The NIST also gave detailed instructions on the
use of elliptic curves over such finite fields.

Below we describe briefly some standard algorithms for the arithmetic of 
prime fields [3]. 

^1 National Institute of Standards and Technology.
The primes $p$ for the prime fields are chosen with a bit size divisible by 32.
Further, $p$ must be either a Mersenne prime of the form $p = 2^n - 1$, or a
pseudo-Mersenne prime of the form $p = 2^n - r$ with the smallest possible integer $r$.
We assume that the implementation platform has an $L$-bit architecture, with
$L \in \{8, 16, 32, 64\}$. Let $t = \lceil \log_2 p \rceil$ and $m = \lceil t/L \rceil$,
where $\lceil x \rceil$ denotes the least integer $k$ such that $k \ge x$; the elements
of the prime field are the integers between $0$ and $p - 1$, stored in software in an
array of $m$ $L$-bit words: $a = (a_0, a_1, \ldots, a_{m-1})$.

These primes allow an efficient modular reduction by using the replacement
$a \cdot 2^n \equiv a \cdot r \pmod p$, repeating it as necessary until the equivalent
number modulo $p$ is obtained.
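The replacement rule can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: the function name `reduce_pseudo_mersenne` is hypothetical, and Python's arbitrary-precision integers stand in for the arrays of $L$-bit words.

```python
def reduce_pseudo_mersenne(x, n, r):
    """Reduce a non-negative integer x modulo p = 2**n - r (assumes 0 < r < 2**n)."""
    p = (1 << n) - r
    mask = (1 << n) - 1
    # Write x = hi * 2**n + lo and replace 2**n by r: x becomes hi * r + lo.
    while x >> n:
        x = (x >> n) * r + (x & mask)
    # Now x < 2**n, so only a final correction by p may remain.
    while x >= p:
        x -= p
    return x

# NIST prime p_192 = 2**192 - 2**64 - 1, hence n = 192 and r = 2**64 + 1.
N, R = 192, (1 << 64) + 1
P192 = (1 << N) - R
```

Each pass strictly decreases the value because $r < 2^n$, so the loop terminates; for a small $r$ very few passes are needed.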

Let $a = (a_0, a_1, \ldots, a_{m-1})$ and $b = (b_0, b_1, \ldots, b_{m-1})$ be two
elements of a prime field $F_p$. The addition is carried out by first finding the sum
word by word and then reducing it modulo $p$. The modular addition is implemented by
using the classic "add with carry" algorithm, and the modular subtraction is
implemented in a similar fashion, where the carry is interpreted as a "borrow".
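A possible word-by-word rendering of "add with carry" and of its "borrow" counterpart is sketched below. The word size `L = 16`, the little-endian word order and all function names are assumptions made for this illustration; operands are assumed to be already reduced modulo $p$.

```python
L = 16                      # assumed word size in bits
MASK = (1 << L) - 1

def to_words(x, m):
    """Split a non-negative integer into m little-endian L-bit words."""
    return [(x >> (L * i)) & MASK for i in range(m)]

def from_words(words):
    return sum(w << (L * i) for i, w in enumerate(words))

def add_words(a, b):
    """Word-by-word sum with carry propagation; returns (words, final carry)."""
    c, carry = [], 0
    for ai, bi in zip(a, b):
        s = ai + bi + carry
        c.append(s & MASK)
        carry = s >> L
    return c, carry

def sub_words(a, b):
    """Word-by-word difference; the carry is interpreted as a borrow."""
    c, borrow = [], 0
    for ai, bi in zip(a, b):
        d = ai - bi - borrow
        c.append(d & MASK)
        borrow = 1 if d < 0 else 0
    return c, borrow

def mod_add(a, b, p):
    c, carry = add_words(a, b)
    # Compare most significant word first; subtract p once if the sum is >= p.
    if carry or c[::-1] >= p[::-1]:
        c, _ = sub_words(c, p)
    return c

def mod_sub(a, b, p):
    c, borrow = sub_words(a, b)
    if borrow:               # negative result: add p back
        c, _ = add_words(c, p)
    return c
```

Since both operands are smaller than $p$, a single conditional subtraction (or addition) of $p$ is enough to bring the result back into the range $[0, p-1]$.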

The multiplication is carried out by using the classic "product term by term",
interpreted as "product word by word", and then reducing it modulo $p$. We
observe that, during the computation, each term $a_i b_j = s_0 + s_1 2^L$ can
still be represented by two $L$-bit words $(s_0, s_1)$.
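The word-by-word product can be sketched as follows; the reduction modulo $p$ is left out. The names and the word size `L = 16` are assumptions of this sketch. The key point is that each partial result `a[i] * b[j] + c[i + j] + carry` is at most $2^{2L} - 1$, so it always splits into a low word $s_0$ and a high word $s_1$.

```python
L = 16                      # assumed word size in bits
MASK = (1 << L) - 1

def to_words(x, m):
    """Split a non-negative integer into m little-endian L-bit words."""
    return [(x >> (L * i)) & MASK for i in range(m)]

def from_words(words):
    return sum(w << (L * i) for i, w in enumerate(words))

def mul_words(a, b):
    """Schoolbook product of two m-word operands; returns 2m words (unreduced)."""
    m = len(a)
    c = [0] * (2 * m)
    for i in range(m):
        carry = 0
        for j in range(m):
            t = a[i] * b[j] + c[i + j] + carry
            c[i + j] = t & MASK      # low word s0
            carry = t >> L           # high word s1, propagated to the next column
        c[i + m] = carry
    return c
```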

The inverse of a nonzero field element $a \in \{1, 2, \ldots, p - 1\}$ is computed by
using a variant of the Extended Euclidean Algorithm. The algorithm maintains
the invariants $Aa + dp = u$ and $Ca + ep = v$ for some $d$ and $e$ which are not
explicitly computed. The algorithm terminates when $u = 0$, in which case $v = 1$
and $Ca + ep = 1$, hence $C \equiv a^{-1} \pmod p$. Then, the division is carried out as
$a/b = ab^{-1}$.
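The invariants can be followed in a short Python sketch of the extended Euclidean algorithm. The function name `inv_mod_p` is hypothetical and plain Python integers replace the word arrays; the variables `A` and `C` play the roles of $A$ and $C$ in the text, while $d$ and $e$ are never computed.

```python
def inv_mod_p(a, p):
    """Inverse of a modulo a prime p, for 1 <= a <= p - 1."""
    u, v = a, p
    A, C = 1, 0              # invariants: A*a = u (mod p), C*a = v (mod p)
    while u != 0:
        q = v // u
        v, u = u, v - q * u
        C, A = A, C - q * A
    # Here u = 0 and v = gcd(a, p) = 1, so C*a = 1 (mod p).
    return C % p

def div_mod_p(a, b, p):
    """Division a/b computed as a * b^(-1) modulo p."""
    return (a * inv_mod_p(b, p)) % p
```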

We have developed similar algorithms for binary fields in limited systems,
whose low performance calls for simple techniques. They rely on the representation
of bit sequences by suitable integers, with the property that addition and
subtraction are the same operation and that the equality $1 + 1 = 0$ holds.

In this paper we describe some simple algorithms that are designed to work
with the arithmetic of the binary fields in limited systems such as
microcontrollers, smart cards, etc. These algorithms are presented in the form of
pseudo-code.

2 Arithmetic on binary fields and algorithms 

In a hardware circuit the data is represented by logical signals $\{0, 1\}$ and
the arithmetic is that of $t$-bit binary sequences. Therefore, the most appropriate
choice for a finite field is $GF(2^t)$. We have the following isomorphism:

$$GF(2^t) \cong GF(2)[x]/p(x)$$

where

$$p(x) = x^t + r(x) = x^t + \sum_{i=0}^{t-1} p_i x^i, \qquad p = (p_0, p_1, \ldots, p_{t-1}, 1) \in GF(2)^{t+1}$$
is an irreducible polynomial of degree $t$ over $GF(2)$. Using this isomorphism,
the operations between $t$-bit binary sequences are identified with the operations
between polynomials of degree at most $t - 1$ modulo $p(x)$.

To optimize the use of hardware memory, we can represent any sequence
of $L$ bits with an unsigned integer between $0$ and $2^L - 1$. More precisely, an
element of $GF(2^t)$ corresponds to $m = \lceil t/L \rceil$ unsigned integers. Then, using
an appropriate representation of binary numbers as integers, we are able to
access the bits representing the coefficients of the polynomials with appropriate
functions and statements in terms of integers.
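As an illustration of such access functions, the following sketch stores a polynomial in a list of $m$ words of $L$ bits and reads or writes single coefficients with integer shifts and masks. The helper names (`get_bit`, `set_bit`, `degree`) and the choice `L = 16` are assumptions of this sketch, not names taken from the paper.

```python
L = 16                      # assumed word size in bits
MASK = (1 << L) - 1

def get_bit(words, k):
    """Coefficient of x**k in the polynomial stored in `words`."""
    return (words[k // L] >> (k % L)) & 1

def set_bit(words, k, value):
    """Set the coefficient of x**k to 0 or 1, in place."""
    if value:
        words[k // L] |= 1 << (k % L)
    else:
        words[k // L] &= MASK ^ (1 << (k % L))

def degree(words):
    """Degree of the stored polynomial, or -1 for the zero polynomial."""
    for i in range(len(words) - 1, -1, -1):
        if words[i]:
            return i * L + words[i].bit_length() - 1
    return -1
```

For $t = 163$ and $L = 16$ an element occupies $m = \lceil 163/16 \rceil = 11$ words.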

Let $d$ be the difference between $t$ and the degree of the polynomial $r(x)$.
For practical reasons, polynomials $r(x)$ with few terms and degree as small as
possible are preferable. One can use irreducible polynomials with three or five
terms (trinomials and pentanomials, respectively) such that $2d \ge t - 1$.

The existence and the properties of certain irreducible polynomials, such as
trinomials and pentanomials over $GF(2)$, have been extensively investigated for
at least 40 years following the paper of R. G. Swan [12]. The relevant
contributions prior to 1983 are surveyed in [S]; see Chapter 3, Notes 5. Recent references
on irreducible polynomials with few terms are [D H [g H [To]. In particular, a
theorem due to Swan [12] implies that irreducible trinomials do not exist for
$t \equiv 0 \pmod 8$. Furthermore, it follows from a result due to Bluher [U that they
are rare when $t \equiv \pm 3 \pmod 8$; this fact originates from observations on
trinomials and pentanomials arising from computations of Ahmadi and Menezes [T]:
If $t \equiv \pm 3 \pmod 8$ and $f(x) = \sum_i a_i x^i \in GF(2)[x]$ is an irreducible
monic polynomial of degree $t$ such that $\mathrm{Tr}(a_i) = 0$ for each $i$ with
$1 \le i < t$, then $f$ contains a term $x^k$ with $t > k > t/3$ and
$k \equiv t - 2 \pmod 4$. In particular, this shows
for irreducible trinomials that the degree of the second term cannot be chosen
to be small.

When an irreducible trinomial of degree $t$ does not exist, the next best
choice is a pentanomial. Usually, the polynomials are generated by computer
using deterministic irreducibility tests, and a table of trinomials or
pentanomials is available for $2 \le t \le 10000$ in HU.

We can write $r(x) = r_0 x^{i_0} + r_1 x^{i_1} + r_2 x^{i_2} + r_3 x^{i_3}$, with two
zero terms in the case of trinomials.

2.1 Addition 

The addition of polynomials corresponds to the logical XOR operation, also
called exclusive or, between the bits of their corresponding binary sequences.
Generally, programming languages for microcontrollers provide the XOR operator
for integers.

Algorithm 1 computes the sum of two elements of $GF(2^t)$ with computational
complexity $O(m)$. The symbol $\oplus$ stands for the binary operator XOR of
unsigned integers.
Algorithm 1 Addition in $GF(2)[x]/p(x)$

Require: $a = (a_0, a_1, \ldots, a_{m-1})$, $b = (b_0, b_1, \ldots, b_{m-1})$, $a_i, b_i \in [0, 2^L - 1]$
Ensure: $a + b = c = (c_0, c_1, \ldots, c_{m-1})$, $c_i \in [0, 2^L - 1]$
for $i = 0$ to $m - 1$ do
    $c_i = a_i \oplus b_i$
end for
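Algorithm 1 translates almost literally into code. The sketch below uses Python's `^` operator for XOR on lists of $L$-bit words; the function name `gf2_add` is an assumption of this example.

```python
def gf2_add(a, b):
    """Sum of two elements of GF(2^t), each given as a list of m words."""
    return [ai ^ bi for ai, bi in zip(a, b)]

# (x^3 + x + 1) + (x^2 + x) = x^3 + x^2 + 1 in the lowest word.
example = gf2_add([0b1011, 0], [0b0110, 1])
```

Since addition and subtraction coincide in characteristic 2, adding an element to itself gives zero.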


2.2 Reduction modulo p(x)

Let $a(x) = \sum_{i=0}^{s} \alpha_i x^i$ be a polynomial of degree $s$, with
$t \le s \le 2t - 2$, represented by the binary sequence
$(\alpha_0, \alpha_1, \ldots, \alpha_s)$ with $\alpha_i \in GF(2)$.

Let $l = (\alpha_0, \alpha_1, \ldots, \alpha_{t-1})$ and
$h = (\alpha_t, \alpha_{t+1}, \ldots, \alpha_{2t-1})$, where $\alpha_z = 0$ for
$s + 1 \le z \le 2t - 1$; then we can write the polynomial as $a = l + h x^t$.

Since $x^t \equiv r(x) \pmod{p(x)}$, we carry out the reduction of $a(x)$ modulo $p(x)$
using the following:

• the equivalence

$$a \equiv l + h\,r(x) = l + r_0 h x^{i_0} + r_1 h x^{i_1} + r_2 h x^{i_2} + r_3 h x^{i_3} \pmod{p(x)};$$

• the operations "$f(x) \ll i$" and "$f(x) \gg i$", which are the respective
equivalents of shifting up and down $i$ positions in the binary sequence of
the polynomial $f(x)$.


Algorithm 2 Reduction modulo $p$ in $GF(2)[x]$

Require: $a = (\alpha_0, \alpha_1, \ldots, \alpha_s)$, $\alpha_i \in GF(2)$, $s \le 2t - 2$
Ensure: $a \pmod{p(x)}$
while $degree(a) \ge t$ do
    $l = (\alpha_0, \alpha_1, \ldots, \alpha_{t-1})$
    $h = (\alpha_t, \alpha_{t+1}, \ldots, \alpha_{2t-1})$, with $\alpha_z = 0$, $s + 1 \le z \le 2t - 1$
    $a = l$, $g = h \ll i_0$
    for $z = 0$ to $3$ do
        if $r_z = 1$ then
            $a = a + g$
        end if
        if $z < 3$ then
            $g = g \ll (i_{z+1} - i_z)$
        end if
    end for
end while


When we shift a binary sequence by $i$ bits up or down, the ones in the uppermost
or lowermost $i$ bits, respectively, are lost. Our algorithms must guarantee that
none of the ones are shifted out, in order to assert that

$[f(x) \cdot x^i] = [f(x) \ll i]$ and $[f(x)/x^i] = [f(x) \gg i]$.
When a polynomial $a(x)$ has degree greater than $t - 1$, we can delete the
terms of degree greater than $t - 1$ by using the equivalence $a \equiv l + h\,r(x)$
$\pmod{p(x)}$ and repeating it if necessary. Since $2d \ge t - 1$, we need to iterate
this operation no more than twice. So, we obtain Algorithm 2, which has
computational complexity $O(km)$, with $k = 4$ or $k = 8$ according as $p(x)$ is a
trinomial or a pentanomial.
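The reduction can be sketched in a few lines of Python. For brevity this sketch makes two assumptions that differ from the word-array setting of the paper: a polynomial is stored as a single Python integer whose bit $k$ is the coefficient of $x^k$, and $r(x)$ is passed as the tuple `exps` of the exponents of its nonzero terms. The name `reduce_mod_p` is hypothetical.

```python
def reduce_mod_p(a, t, exps):
    """Reduce a modulo p(x) = x**t + sum(x**e for e in exps), with max(exps) < t."""
    mask = (1 << t) - 1
    while a >> t:                    # degree(a) >= t
        l, h = a & mask, a >> t      # a = l + h * x**t
        a = l
        for e in exps:               # a = l + h * r(x), one shifted copy per term
            a ^= h << e
    return a

# NIST field of degree 163: p(x) = x^163 + x^7 + x^6 + x^3 + 1.
T, EXPS = 163, (0, 3, 6, 7)
```

Each pass lowers the degree by $d = t - \deg r$, which is why the condition $2d \ge t - 1$ bounds the number of passes by two for inputs of degree at most $2t - 2$.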


2.3 Square 

Since $GF(2)$ is a field of characteristic 2, the following equality holds:

$$a(x)^2 = \left( \sum_{i=0}^{t-1} \alpha_i x^i \right)^2 = \sum_{i=0}^{t-1} \alpha_i x^{2i}.$$

Algorithm 3 Square in $GF(2)[x]/p(x)$

Require: $a = (\alpha_0, \alpha_1, \ldots, \alpha_{t-1})$, $\alpha_i \in GF(2)$
Ensure: $a^2 \pmod{p(x)}$
temporary variable: $b = (\beta_0, \beta_1, \ldots, \beta_{2t-2})$, $\beta_i \in GF(2)$
for $i = 0$ to $t - 2$ do
    $\beta_{2i} = \alpha_i$
    $\beta_{2i+1} = 0$
end for
$\beta_{2t-2} = \alpha_{t-1}$
$a^2 = b \pmod{p(x)}$


Therefore, we can compute the square of a polynomial simply by doubling its
indices and then performing the reduction modulo $p(x)$. We obtain Algorithm 3,
whose main computational cost is due to the reduction.
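The index-doubling step followed by a reduction can be sketched as below. As an assumption of this sketch, polynomials are stored as Python integers (bit $k$ is the coefficient of $x^k$) and $r(x)$ is given by the exponents of its nonzero terms; the function names are hypothetical.

```python
def reduce_mod_p(a, t, exps):
    """Reduce a modulo p(x) = x**t + sum(x**e for e in exps)."""
    mask = (1 << t) - 1
    while a >> t:
        l, h = a & mask, a >> t
        a = l
        for e in exps:
            a ^= h << e
    return a

def spread_bits(a, t):
    """Move coefficient alpha_i to position 2i, leaving zeros in odd positions."""
    b = 0
    for i in range(t):
        if (a >> i) & 1:
            b |= 1 << (2 * i)
    return b

def gf2_square(a, t, exps):
    return reduce_mod_p(spread_bits(a, t), t, exps)
```

A quick sanity check is the Frobenius property: in $GF(2^t)$, squaring an element $t$ times returns the element itself.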

2.4 Product 

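Let $a, b \in GF(2)[x]$ be two polynomials, with

$$a(x) = \sum_{i=0}^{t-1} \alpha_i x^i, \qquad \alpha_i \in GF(2).$$

Since $[b \cdot x^i] = [b \ll i]$, the product between $a$ and $b$ is

$$a \cdot b = \sum_{i=0}^{t-1} \alpha_i x^i \cdot b = \sum_{i=0}^{t-1} \alpha_i \cdot (b x^i) = \sum_{i=0}^{t-1} \alpha_i \cdot (b \ll i)$$

which has computational complexity $O(t)$ plus $t$ shifts and the cost of the reduction.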
However, we can perform the product faster as follows. Let
$a = (\alpha_0, \alpha_1, \ldots, \alpha_{t-1}) \in GF(2)^t$,
$b = (b_0, b_1, \ldots, b_{m-1}) \in [0, 2^L - 1]^m$, $w = \lceil 2t/L \rceil$, and define
the operation $Sh(b, i) = b \ll iL$. We note that $Sh(b, i) = (s_0, s_1, \ldots, s_{w-1})$, where

$s_j = b_{j-i}$ if $j \in [i, m - 1 + i]$, and $s_j = 0$ otherwise.

By using the operation $Sh$, we only need to do $L$ shift operations, instead
of $t$, in this way:

$$a \cdot b = \sum_{e=0}^{L-1} \left\{ \sum_{i=0}^{m-1} \alpha_{iL+e} \, [Sh((b \ll e), i)] \right\}$$

Algorithm 4 Product in $GF(2)[x]/p(x)$

Require: $a = (\alpha_0, \alpha_1, \ldots, \alpha_{t-1}) \in GF(2)^t$, $b = (b_0, b_1, \ldots, b_{m-1}) \in [0, 2^L - 1]^m$
Ensure: $a \cdot b \pmod{p(x)}$
temporary variables: $c, d \in [0, 2^L - 1]^w$
$c = (c_0, c_1, \ldots, c_{w-1})$, with $c_i = 0$, $0 \le i \le w - 1$
$d = (d_0, d_1, \ldots, d_{w-1})$, with $d_i = b_i$, $0 \le i \le m - 1$, and $d_i = 0$, $m \le i \le w - 1$
for $e = 0$ to $L - 1$ do
    for $i = 0$ to $m - 1$ do
        if $\alpha_{iL+e} = 1$ then
            for $j = i$ to $w - 1$ do
                $c_j = c_j + d_{j-i}$
            end for
        end if
    end for
    $d = d \ll 1$
end for
$a \cdot b = c \pmod{p(x)}$


We obtain Algorithm 4, which has computational complexity $O(Lm) \approx O(t)$
plus $L$ shifts and the cost of the reduction.
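A sketch of Algorithm 4 on lists of $L$-bit words follows. The word size `L = 16`, the little-endian word order and the function names are assumptions of this illustration; the final reduction is done on a Python integer only to keep the example short.

```python
L = 16                      # assumed word size in bits
MASK = (1 << L) - 1

def to_words(x, m):
    return [(x >> (L * i)) & MASK for i in range(m)]

def from_words(words):
    return sum(w << (L * i) for i, w in enumerate(words))

def shift_left_1(d):
    """d << 1 on a word array, carrying the top bit of each word upwards."""
    out, carry = [], 0
    for w in d:
        out.append(((w << 1) & MASK) | carry)
        carry = w >> (L - 1)
    return out

def comb_product(a, b, t):
    """Unreduced product of a and b, each given as m = ceil(t/L) words."""
    m = -(-t // L)
    w = -(-2 * t // L)
    c = [0] * w
    d = list(b) + [0] * (w - m)
    for e in range(L):
        for i in range(m):
            if (a[i] >> e) & 1:              # bit alpha_{iL+e} of a
                for j in range(i, w):
                    c[j] ^= d[j - i]         # add Sh(b << e, i)
        d = shift_left_1(d)
    return c

def reduce_mod_p(a, t, exps):
    mask = (1 << t) - 1
    while a >> t:
        l, h = a & mask, a >> t
        a = l
        for x_exp in exps:
            a ^= h << x_exp
    return a

def gf2_mul(a, b, t, exps):
    """Product in GF(2)[x]/p(x); returns the reduced result as an integer."""
    return reduce_mod_p(from_words(comb_product(a, b, t)), t, exps)
```

Only $L$ one-bit shifts of `d` are performed; the word offset $i$ is obtained for free through the index `j - i`.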

2.5 Inversion and division 

To compute the inverse of polynomials we use a variant of the classical 
Euclidean algorithm. We can carry out the division between two polynomials 
by multiplying the first one by the inverse of the second one. 

Let $a(x)$ and $b(x)$ be two polynomials in $GF(2)[x]$. Then
$\gcd(a, b) = \gcd(b - ca, a)$ for all polynomials $c \in GF(2)[x]$. If
$\deg(b) \ge \deg(a)$ and $j = \deg(b) - \deg(a)$, we can compute $r = b + x^j a$,
and $\gcd(a, b) = \gcd(r, a)$ holds.

With this variant, we can use the extended Euclidean algorithm and obtain
Algorithm 5, which has computational complexity $O(4tm)$; see [4].
Algorithm 5 Inversion in $GF(2)[x]/p(x)$

Require: $a = (\alpha_0, \alpha_1, \ldots, \alpha_{t-1}) \ne 0$, $\alpha_i \in GF(2)$
Ensure: $a^{-1} \pmod{p(x)}$
temporary variables in $GF(2^t)$: $u = a$, $v = p$, $g_1 = 1$, $g_2 = 0$
while $degree(u) \ne 0$ do
    $j = degree(u) - degree(v)$
    if $j < 0$ then
        $swap(u, v)$, $swap(g_1, g_2)$, $j = -j$
    end if
    $u = u + (v \ll j)$
    $g_1 = g_1 + (g_2 \ll j)$
end while
$a^{-1} = g_1$
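Algorithm 5 can be sketched as follows. As assumptions of this sketch, polynomials are Python integers (bit $k$ is the coefficient of $x^k$), $p(x)$ is assumed irreducible and is given by its degree `t` and the exponents `exps` of $r(x)$, and the input is nonzero with degree below $t$. The helper `gf2_mulmod` is only there to check the result and is not part of the algorithm.

```python
def gf2_inverse(a, t, exps):
    """Inverse of a nonzero a modulo p(x) = x**t + sum(x**e for e in exps)."""
    if a == 0:
        raise ValueError("zero has no inverse")
    p = (1 << t) | sum(1 << e for e in exps)
    u, v = a, p
    g1, g2 = 1, 0                    # invariants: g1*a = u, g2*a = v (mod p)
    while u != 1:                    # same as degree(u) != 0 for nonzero u
        j = u.bit_length() - v.bit_length()   # degree(u) - degree(v)
        if j < 0:
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j                  # cancels the leading term of u
        g1 ^= g2 << j
    return g1

def gf2_mulmod(a, b, t, exps):
    """Shift-and-add product modulo p(x); a must have degree below t."""
    p = (1 << t) | sum(1 << e for e in exps)
    c = 0
    while b:
        if b & 1:
            c ^= a
        b >>= 1
        a <<= 1
        if a >> t:                   # degree reached t: subtract p(x)
            a ^= p
    return c

def gf2_div(a, b, t, exps):
    """Division a/b computed as a * b^(-1)."""
    return gf2_mulmod(a, gf2_inverse(b, t, exps), t, exps)
```

For example, in $GF(2)[x]/(x^3 + x + 1)$ the inverse of $x$ is $x^2 + 1$, since $x \cdot (x^2 + 1) = x^3 + x \equiv 1$.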


3 Tests performed 

We tested these algorithms on a commercially available and very cheap
board. This board, called Arduino™ Duemilanove^2, has computing power
similar to that of smart cards and has the following features:

• ATmega168 microcontroller^3;

• 16 KB (14 KB available) of in-system self-programmable flash memory;

• 1 KB SRAM and 512 bytes EEPROM;

• 16 MHz clock speed;

• language based on C/C++;

• standard serial communication.

Below, we show the most significant results obtained on the 5 binary fields
that NIST recommended in the publication FIPS 186-3, with the following
polynomial basis representation:

• $F_{2^{163}} = GF(2)[x]/(x^{163} + x^7 + x^6 + x^3 + 1)$,

• $F_{2^{233}} = GF(2)[x]/(x^{233} + x^{74} + 1)$,

• $F_{2^{283}} = GF(2)[x]/(x^{283} + x^{12} + x^7 + x^5 + 1)$,

• $F_{2^{409}} = GF(2)[x]/(x^{409} + x^{87} + 1)$,

• $F_{2^{571}} = GF(2)[x]/(x^{571} + x^{10} + x^5 + x^2 + 1)$.
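As a quick check, the condition $2d \ge t - 1$ from Section 2, where $d$ is the difference between $t$ and the degree of $r(x)$, can be verified for the five reduction polynomials listed above. The dictionary layout and the function names below are assumptions of this sketch.

```python
# Degree t -> exponents of the nonzero terms of r(x), taken from the list above.
NIST_POLYS = {
    163: (7, 6, 3, 0),
    233: (74, 0),
    283: (12, 7, 5, 0),
    409: (87, 0),
    571: (10, 5, 2, 0),
}

def gap(t, exps):
    """d = t - deg r(x)."""
    return t - max(exps)

def two_pass_ok(t, exps):
    """True when two applications of x**t = r(x) suffice for inputs of degree <= 2t - 2."""
    return 2 * gap(t, exps) >= t - 1

results = {t: (gap(t, exps), two_pass_ok(t, exps)) for t, exps in NIST_POLYS.items()}
```

All five polynomials satisfy the condition with a wide margin, so the reduction never needs more than two passes on these fields.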


^2 http://www.arduino.cc/

^3 Low Power AVR® Microcontroller manufactured by ATMEL®.
Degree of field                    163    233    283    409    571
multiplication on binary fields     16     29     40     80    149
inversion on binary fields          60    105    145    282    505

Table 1: Execution times on binary fields (in ms)


Degree of field                    192    224    256    384    521
multiplication on prime fields       6      7      9     18     29
inversion on prime fields          234    344    490   1442   3258

Table 2: Execution times on prime fields (in ms)


For comparison, we have also implemented the algorithms on
the NIST prime fields described in Section 1. Tables 1 and 2 report
the execution times for multiplication and inversion on the NIST binary fields
and on the NIST prime fields, respectively. Figure 1 provides a visual comparison
between the execution times on binary fields and on prime fields.

[Plot omitted; legend: ♦ inversion, ■ multiplication]

Figure 1: Time comparisons, L = 16 bits.


4 Conclusion 

In this paper, we presented an implementation of the arithmetic in $GF(2^t)$
in polynomial basis, using straightforward algorithms with low memory usage.
The algorithms we used are as generic as possible, so we can easily change
the parameters and the underlying field $GF(2^t)$. Thanks to their flexibility, these
algorithms can be used in systems with limited computing resources.

From the comparison between the execution times, we observe that
multiplication on prime fields is faster than on binary fields (for example,
29 ms at 521 bits against 149 ms at degree 571), while inversion on prime fields
is much slower than on binary fields (3258 ms against 505 ms at the same sizes),
and its execution time grows very rapidly with the field size.

Furthermore, we can observe that our algorithms proved to be very efficient
and particularly suitable for small devices and for tasks which require the use of
field inversions.